Telco Customer Churn Prediction 😁😊🙁😠😡

¶

1. Problem statement
2. Import Libraries and Data
3. Handling Missing Values
4. Data Analysis and Visualization
5. Outlier Detection
6. Check for Rare Categories
7. Categorical Variables Encoding
8. Balance Data
9. Dataset Splitting
10. Feature Scaling
11. Modeling and Parameter Optimization
12. Feature Importance
13. Results

¶

1. Problem Statement

¶

1.1. Introduction

What is Customer Churn?

Customer churn is the percentage of customers that stopped using company's product or service during a certain time frame. Customer churn is one of the most important metrics for a growing business to evaluate as it is much less expensive to retain existing customers than it is to acquire new customers. Customers in the telecom industry can choose from a variety of service providers and actively switch from one to the next. The telecommunications business has an annual churn rate of 15-25 percent in this highly competitive market.

Customer churn is extremley costly for companies. Based on a churn rate just under two percent for top companies, one source estimates carriers lose $65 million per month from churn. To reduce customer churn, telecom companies should predict which customers are highly prone to churn.

Individualized customer retention is demanding because most companies have a large number of customers and cannot afford to devote much time to each of them. The costs would be too great, outweighing the additional revenue. However, if a corporation could forecast which customers are likely to leave ahead of time, it could concentrate customer retention efforts only on these "high risk" clients.

¶

1.2. Obejctives

In this projects below questions will be answered:

What's the $\%$ of Customers Churn and customers that keep in with the active services?
Is there any patterns in Customers Churn based on the gender?
Is there any patterns/preference in Customers Churn based on the type of service provided?
What's the most profitable service types?
Which features and services are most profitable?
Which features have the most impact on predicting customers churn?
Which model is the best for predicting churn?

¶

1.3. Dataset Features

Customer ID: A unique ID that identifies each customer.

Demographic info about customers:

gender: Whether the customer is a male or a female
SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
Partner: Whether the customer has a partner or not (Yes, No)
Dependents: Whether the customer has dependents or not (Yes, No)

Services that each customer has signed up for:

PhoneService: Whether the customer has a phone service or not (Yes, No)
MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service)
DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)

Customer account information:

tenure: Number of months the customer has stayed with the company
Contract: The contract term of the customer (Month-to-month, One year, Two year)
PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
MonthlyCharges: The amount charged to the customer monthly
TotalCharges: The total amount charged to the customer
Churn: Target, Whether the customer has left within the last month or not (Yes or No)

¶

2. Import Libraries and Data

Back to Table of Contents

In [46]:

!pip install mlens

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: mlens in /usr/local/lib/python3.7/dist-packages (0.2.3)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from mlens) (1.21.6)
Requirement already satisfied: scipy>=0.17 in /usr/local/lib/python3.7/dist-packages (from mlens) (1.4.1)

In [47]:

# handle table-like data and matrices
import pandas as pd
import numpy as np

# visualisation
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
init_notebook_mode(connected=True)

# preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score

# balance data
from imblearn.over_sampling import BorderlineSMOTE

# models
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, ExtraTreesClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from mlens.ensemble import SuperLearner
from sklearn.neural_network import MLPClassifier

# evaluations
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, roc_auc_score, plot_roc_curve, roc_curve, auc
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

# to display the total number columns present in the dataset
pd.set_option('display.max_columns', None)

In [48]:

data = pd.read_csv('Telco Customer Churn.csv')

¶

3. Handling Missing Values

Back to Table of Contents

let's find if we have missing values in the dataset.

In [49]:

data = data.replace(r'^\s*$', np.nan, regex=True)

In [50]:

data.isnull().sum()

Out[50]:

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

In [51]:

msno.matrix(data);

If we examine the data carefully, we can actually estimate the value of the missing data.

Contract length in month * tenure (if not 0) * monthly charges

This is more accurate than filling missing values with mean or median.

In [52]:

data[data['TotalCharges'].isnull()].index.tolist()

Out[52]:

[488, 753, 936, 1082, 1340, 3331, 3826, 4380, 5218, 6670, 6754]

In [53]:

ind = data[data['TotalCharges'].isnull()].index.tolist()
for i in ind:
  if data['Contract'].iloc[i,] == 'Two year':
    data['TotalCharges'].iloc[i,] = int(np.maximum(data['tenure'].iloc[i,], 1)) * data['MonthlyCharges'].iloc[i,] * 24
  elif data['Contract'].iloc[i,] == 'One year':
    data['TotalCharges'].iloc[i,] = int(np.maximum(data['tenure'].iloc[i,], 1)) * data['MonthlyCharges'].iloc[i,] * 12
  else:
    data['TotalCharges'].iloc[i,] = int(np.maximum(data['tenure'].iloc[i,], 1)) * data['MonthlyCharges'].iloc[i,]

In [54]:

data.isnull().sum()

Out[54]:

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

let's find if we have duplicate rows.

In [55]:

data.duplicated().sum()

Out[55]:

¶

4. Data Analysis and Visualization

Back to Table of Contents

In [56]:

data.head(3)

Out[56]:

	customerID	gender	Partner	Dependents	tenure	PhoneService	MultipleLines	InternetService	OnlineSecurity	OnlineBackup	DeviceProtection	TechSupport	StreamingTV	StreamingMovies	Contract	PaperlessBilling	PaymentMethod	MonthlyCharges	TotalCharges	Churn
0	7590-VHVEG	Female	Yes	No	1	No	No phone service	DSL	No	Yes	No	No	No	No	Month-to-month	Yes	Electronic check	29.85	29.85	No
1	5575-GNVDE	Male	No	No	34	Yes	No	DSL	Yes	No	Yes	No	No	No	One year	No	Mailed check	56.95	1889.5	No
2	3668-QPYBK	Male	No	No	2	Yes	No	DSL	Yes	Yes	No	No	No	No	Month-to-month	Yes	Mailed check	53.85	108.15	Yes

In [57]:

data.shape

Out[57]:

(7043, 21)

There are 7043 cutomers and 21 features in the dataset.

In [58]:

for i in data.columns[6:-3]:
  print(f'Number of categories in the variable {i}: {len(data[i].unique())}')

Number of categories in the variable PhoneService: 2
Number of categories in the variable MultipleLines: 3
Number of categories in the variable InternetService: 3
Number of categories in the variable OnlineSecurity: 3
Number of categories in the variable OnlineBackup: 3
Number of categories in the variable DeviceProtection: 3
Number of categories in the variable TechSupport: 3
Number of categories in the variable StreamingTV: 3
Number of categories in the variable StreamingMovies: 3
Number of categories in the variable Contract: 3
Number of categories in the variable PaperlessBilling: 2
Number of categories in the variable PaymentMethod: 4

In [59]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
 17  PaymentMethod     7043 non-null   object 
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object 
 20  Churn             7043 non-null   object 
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

In [60]:

data.describe()

Out[60]:

	SeniorCitizen	tenure	MonthlyCharges
count	7043.000000	7043.000000	7043.000000
mean	0.162147	32.371149	64.761692
std	0.368612	24.559481	30.090047
min	0.000000	0.000000	18.250000
25%	0.000000	9.000000	35.500000
50%	0.000000	29.000000	70.350000
75%	0.000000	55.000000	89.850000
max	1.000000	72.000000	118.750000

In [61]:

data.describe(include=object).T

Out[61]:

	count	unique	top	freq
customerID	7043	7043	7590-VHVEG	1
gender	7043	2	Male	3555
Partner	7043	2	No	3641
Dependents	7043	2	No	4933
PhoneService	7043	2	Yes	6361
MultipleLines	7043	3	No	3390
InternetService	7043	3	Fiber optic	3096
OnlineSecurity	7043	3	No	3498
OnlineBackup	7043	3	No	3088
DeviceProtection	7043	3	No	3095
TechSupport	7043	3	No	3473
StreamingTV	7043	3	No	2810
StreamingMovies	7043	3	No	2785
Contract	7043	3	Month-to-month	3875
PaperlessBilling	7043	2	Yes	4171
PaymentMethod	7043	4	Electronic check	2365
TotalCharges	7043	6541	20.2	11
Churn	7043	2	No	5174

In [62]:

fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['gender'].unique(), values=data['gender'].value_counts(), name='Gender', 
                     marker_colors=['gold', 'mediumturquoise']), 1, 1)
fig.add_trace(go.Pie(labels=data['Churn'].unique(), values=data['Churn'].value_counts(), name='Churn', 
                     marker_colors=['darkorange', 'lightgreen']), 1, 2)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Gender and Churn Distributions<b>', 
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Gender', x=0.19, y=0.5, font_size=20, showarrow=False),
                 dict(text='Churn', x=0.8, y=0.5, font_size=20, showarrow=False)])
iplot(fig)

We have imbalanced data.
$26.6 \%$ of customers switched to another company.
Customers are $49.5 \%$ female and $50.5 \%$ male.

In [63]:

fig = px.sunburst(data, path=['Churn', 'gender'], title='<b>Sunburst Plot of Gender and churn<b>')
iplot(fig)

In [64]:

print(f'A female customer has a probability of {round(data[(data["gender"] == "Female") & (data["Churn"] == "Yes")].count()[0] / data[(data["gender"] == "Female")].count()[0] *100,2)} % churn')

print(f'A male customer has a probability of {round(data[(data["gender"] == "Male") & (data["Churn"] == "Yes")].count()[0] / data[(data["gender"] == "Male")].count()[0]*100,2)} % churn')

A female customer has a probability of 26.92 % churn
A male customer has a probability of 26.16 % churn

There is negligible difference in customer percentage who changed the service provider. Both genders behaved in similar way when it comes to migrating to another service provider.

In [65]:

fig = px.histogram(data, x='Churn', color='Contract', barmode='group', title='<b>Customer Contract Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#EC7063','#E9F00B','#0BF0D1'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [66]:

print(f'A customer with month-to-month contract has a probability of {round(data[(data["Contract"] == "Month-to-month") & (data["Churn"] == "Yes")].count()[0] / data[(data["Contract"] == "Month-to-month")].count()[0] *100,2)} % churn')

print(f'A customer with one year contract has a probability of {round(data[(data["Contract"] == "One year") & (data["Churn"] == "Yes")].count()[0] / data[(data["Contract"] == "One year")].count()[0]*100,2)} % churn')

print(f'A customer with two year contract has a probability of {round(data[(data["Contract"] == "Two year") & (data["Churn"] == "Yes")].count()[0] / data[(data["Contract"] == "Two year")].count()[0]*100,2)} % churn')

A customer with month-to-month contract has a probability of 42.71 % churn
A customer with one year contract has a probability of 11.27 % churn
A customer with two year contract has a probability of 2.83 % churn

About $43\%$ of customer with Month-to-Month Contract opted to move out as compared to $11\%$ of customrs with One Year Contract and $3\%$ with Two Year Contract. A major percent of people who left the comapny had Month-to-Month Contract. This is acutually logical since people who have long-term contract are more loyal to the company.

In [67]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['PaymentMethod'].unique(), values=data['PaymentMethod'].value_counts(), name='Payment Method',
                     marker_colors=['gold', 'mediumturquoise','darkorange', 'lightgreen']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Payment Method Distributions<b>', 
    annotations=[dict(text='Payment Method', x=0.5, y=0.5, font_size=18, showarrow=False)])
iplot(fig)

In [68]:

fig = px.histogram(data, x='Churn', color='PaymentMethod', barmode='group', title='<b>Payment Method Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#EC7063', '#0BF0D1', '#E9F00B', '#5DADE2'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [69]:

print(f'A customer that use Electronic check for paying has a probability of {round(data[(data["PaymentMethod"] == "Electronic check") & (data["Churn"] == "Yes")].count()[0] / data[(data["PaymentMethod"] == "Electronic check")].count()[0] *100,2)} % churn')

print(f'A customer that use Mailed check for paying has a probability of {round(data[(data["PaymentMethod"] == "Mailed check") & (data["Churn"] == "Yes")].count()[0] / data[(data["PaymentMethod"] == "Mailed check")].count()[0]*100,2)} % churn')

print(f'A customer that use Bank transfer (automatic) for paying has a probability of {round(data[(data["PaymentMethod"] == "Bank transfer (automatic)") & (data["Churn"] == "Yes")].count()[0] / data[(data["PaymentMethod"] == "Bank transfer (automatic)")].count()[0]*100,2)} % churn')

print(f'A customer that use Credit card (automatic) for paying has a probability of {round(data[(data["PaymentMethod"] == "Credit card (automatic)") & (data["Churn"] == "Yes")].count()[0] / data[(data["PaymentMethod"] == "Credit card (automatic)")].count()[0]*100,2)} % churn')

A customer that use Electronic check for paying has a probability of 45.29 % churn
A customer that use Mailed check for paying has a probability of 19.11 % churn
A customer that use Bank transfer (automatic) for paying has a probability of 16.71 % churn
A customer that use Credit card (automatic) for paying has a probability of 15.24 % churn

Major customers who moved out had Electronic Check as Payment Method.
Customers who chose Credit-Card automatic transfer or Bank Automatic Transfer and Mailed Check as Payment Method were less likely to move out.

In [70]:

data[data['gender']=='Male'][['InternetService', 'Churn']].value_counts()

Out[70]:

InternetService  Churn
DSL              No       993
Fiber optic      No       910
No               No       722
Fiber optic      Yes      633
DSL              Yes      240
No               Yes       57
dtype: int64

In [71]:

data[data['gender']=='Female'][['InternetService', 'Churn']].value_counts()

Out[71]:

InternetService  Churn
DSL              No       969
Fiber optic      No       889
No               No       691
Fiber optic      Yes      664
DSL              Yes      219
No               Yes       56
dtype: int64

In [72]:

fig = go.Figure()

fig.add_trace(go.Bar(
  x = [['Churn:No', 'Churn:No', 'Churn:Yes', 'Churn:Yes'],
       ['Female', 'Male', 'Female', 'Male']],
  y = [965, 992, 219, 240],
  name = 'DSL', 
))

fig.add_trace(go.Bar(
  x = [['Churn:No', 'Churn:No', 'Churn:Yes', 'Churn:Yes'],
       ['Female', 'Male', 'Female', 'Male']],
  y = [889, 910, 664, 633],
  name = 'Fiber optic',
))

fig.add_trace(go.Bar(
  x = [['Churn:No', 'Churn:No', 'Churn:Yes', 'Churn:Yes'],
       ['Female', 'Male', 'Female', 'Male']],
  y = [690, 717, 56, 57],
  name = 'No Internet',
))

fig.update_layout(title_text='<b>Churn Distribution w.r.t. Internet Service and Gender</b>')
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

A lot of customers choose the Fiber optic service and it's also evident that the customers who use Fiber optic have high churn rate, this might suggest a dissatisfaction with this type of internet service.
Customers having DSL service are majority in number and have less churn rate compared to Fibre optic service.

In [73]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['Dependents'].unique(), values=data['Dependents'].value_counts(), name='Dependents',
                     marker_colors=['#E5527A ', '#AAB7B8']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Dependents Distribution<b>', 
    annotations=[dict(text='Dependents', x=0.5, y=0.5, font_size=18, showarrow=False)])
iplot(fig)

In [74]:

fig = px.histogram(data, x='Dependents', color='Churn', barmode='group', title='<b>Dependents Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#00CC96','#FFA15A'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [75]:

print(f'A customer with dependents has a probability of {round(data[(data["Dependents"] == "Yes") & (data["Churn"] == "Yes")].count()[0] / data[(data["Dependents"] == "Yes")].count()[0] *100,2)} % churn')

print(f'A customer without dependents has a probability of {round(data[(data["Dependents"] == "No") & (data["Churn"] == "Yes")].count()[0] / data[(data["Dependents"] == "No")].count()[0]*100,2)} % churn')

A customer with dependents has a probability of 15.45 % churn
A customer without dependents has a probability of 31.28 % churn

Customers without dependents are more likely to churn

In [76]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['Partner'].unique(), values=data['Partner'].value_counts(), name='Partner',
                     marker_colors=['gold', 'purple']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Partner Distribution<b>', 
    annotations=[dict(text='Partner', x=0.5, y=0.5, font_size=18, showarrow=False)])
iplot(fig)

In [77]:

fig = px.histogram(data, x='Churn', color='Partner', barmode='group', title='<b>Partner Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#C82735','#BCC827'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [78]:

print(f'A customer with a partner has a probability of {round(data[(data["Partner"] == "Yes") & (data["Churn"] == "Yes")].count()[0] / data[(data["Partner"] == "Yes")].count()[0] *100,2)} % churn')

print(f'A customer without a partner has a probability of {round(data[(data["Partner"] == "No") & (data["Churn"] == "Yes")].count()[0] / data[(data["Partner"] == "No")].count()[0]*100,2)} % churn')

A customer with a partner has a probability of 19.66 % churn
A customer without a partner has a probability of 32.96 % churn

Customers that doesn't have partners are more likely to churn

In [79]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=['No', 'Yes'], values=data['SeniorCitizen'].value_counts(), name='Senior Citizen',
                     marker_colors=['#56E11A', '#1A87E1']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Senior Citizen Distribution<b>', 
    annotations=[dict(text='Senior Citizen', x=0.5, y=0.5, font_size=18, showarrow=False)])
iplot(fig)

In [80]:

fig = px.histogram(data, x='Churn', color='SeniorCitizen', barmode='group', title='<b>Senior Citizen Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#E11AC6','#BAE11A'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [81]:

print(f'A customer that is a senior citizen has a probability of {round(data[(data["SeniorCitizen"] == 1) & (data["Churn"] == "Yes")].count()[0] / data[(data["SeniorCitizen"] == 1)].count()[0] *100,2)} % churn')

print(f'A customer that is not a senior citizen has a probability of {round(data[(data["SeniorCitizen"] == 0) & (data["Churn"] == "Yes")].count()[0] / data[(data["SeniorCitizen"] == 0)].count()[0]*100,2)} % churn')

A customer that is a senior citizen has a probability of 41.68 % churn
A customer that is not a senior citizen has a probability of 23.61 % churn

It can be observed that the fraction of senior citizen is very less.
About $42\%$ of the senior citizens churn.

In [82]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['OnlineSecurity'].unique(), values=data['OnlineSecurity'].value_counts(), name='OnlineSecurity',
                     marker_colors=['#1AE178', '#2CECE6', 'red']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Online Security Distribution<b>', 
    annotations=[dict(text='Online Security', x=0.5, y=0.5, font_size=18, showarrow=False)])
iplot(fig)

In [83]:

fig = px.histogram(data, x='Churn', color='OnlineSecurity', barmode='group', title='<b>Online Security Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#EB984E','yellow', '#5499C7'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [84]:

print(f'A customer with an online security has a probability of {round(data[(data["OnlineSecurity"] == "Yes") & (data["Churn"] == "Yes")].count()[0] / data[(data["OnlineSecurity"] == "Yes")].count()[0] *100,2)} % churn')

print(f'A customer without an online security has a probability of {round(data[(data["OnlineSecurity"] == "No") & (data["Churn"] == "Yes")].count()[0] / data[(data["OnlineSecurity"] == "No")].count()[0]*100,2)} % churn')

print(f'A customer with no internet service has a probability of {round(data[(data["OnlineSecurity"] == "No internet service") & (data["Churn"] == "Yes")].count()[0] / data[(data["OnlineSecurity"] == "No internet service")].count()[0]*100,2)} % churn')

A customer with an online security has a probability of 14.61 % churn
A customer without an online security has a probability of 41.77 % churn
A customer with no internet service has a probability of 7.4 % churn

Most customers churn in the absence of online security.

In [85]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['PaperlessBilling'].unique(), values=data['PaperlessBilling'].value_counts(), name='PaperlessBilling',
                     marker_colors=['LightCoral', '#CCCCFF']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>PaperlessBilling Distribution<b>', 
    annotations=[dict(text='PaperlessBilling Security', x=0.5, y=0.5, font_size=14, showarrow=False)])
iplot(fig)

In [86]:

fig = px.histogram(data, x='Churn', color='PaperlessBilling', barmode='group', title='<b>Paperless Billing Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#9FE2BF', '#FF7F50'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [87]:

print(f'A customer with PaperlessBilling has a probability of {round(data[(data["PaperlessBilling"] == "Yes") & (data["Churn"] == "Yes")].count()[0] / data[(data["PaperlessBilling"] == "Yes")].count()[0] *100,2)} % churn')

print(f'A customer without PaperlessBilling has a probability of {round(data[(data["PaperlessBilling"] == "No") & (data["Churn"] == "Yes")].count()[0] / data[(data["PaperlessBilling"] == "No")].count()[0]*100,2)} % churn')

A customer with PaperlessBilling has a probability of 33.57 % churn
A customer without PaperlessBilling has a probability of 16.33 % churn

Customers with Paperless Billing are most likely to churn.

In [88]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['TechSupport'].unique(), values=data['TechSupport'].value_counts(), name='TechSupport',
                     marker_colors=['#DE3163', '#DFFF00', '#40E0D0']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>TechSupport Distribution<b>', 
    annotations=[dict(text='Tech Support', x=0.5, y=0.5, font_size=18, showarrow=False)])
iplot(fig)

In [89]:

fig = px.histogram(data, x='Churn', color='TechSupport', barmode='group', title='<b>Tech Support Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#FFBF00', 'IndianRed', 'red'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [90]:

print(f'A customer with a tech support has a probability of {round(data[(data["TechSupport"] == "Yes") & (data["Churn"] == "Yes")].count()[0] / data[(data["TechSupport"] == "Yes")].count()[0] *100,2)} % churn')

print(f'A customer without a tech support has a probability of {round(data[(data["TechSupport"] == "No") & (data["Churn"] == "Yes")].count()[0] / data[(data["TechSupport"] == "No")].count()[0]*100,2)} % churn')

print(f'A customer with no internet service has a probability of {round(data[(data["TechSupport"] == "No internet service") & (data["Churn"] == "Yes")].count()[0] / data[(data["TechSupport"] == "No internet service")].count()[0]*100,2)} % churn')

A customer with a tech support has a probability of 15.17 % churn
A customer without a tech support has a probability of 41.64 % churn
A customer with no internet service has a probability of 7.4 % churn

Customers with no TechSupport are most likely to migrate to another service provider.

In [91]:

fig = make_subplots(rows=1, cols=1, specs=[[{'type':'domain'}]])

fig.add_trace(go.Pie(labels=data['PhoneService'].unique(), values=data['PhoneService'].value_counts(), name='PhoneService',
                     marker_colors=['LightSalmon', '#7FB3D5']), 1, 1)

fig.update_traces(hole=0.5, textfont_size=20, marker=dict(line=dict(color='black', width=2)))

fig.update_layout(
    title_text='<b>Phone Service Distribution<b>', 
    annotations=[dict(text='Phone Service', x=0.5, y=0.5, font_size=20, showarrow=False)])
iplot(fig)

In [92]:

fig = px.histogram(data, x='Churn', color='PhoneService', barmode='group', title='<b>Phone Service Distribution w.r.t. Churn<b>', 
                   color_discrete_sequence = ['#FFBF00', 'IndianRed'], text_auto=True)

fig.update_layout(width=1100, height=500, bargap=0.3)
fig.update_traces(marker_line_width=2,marker_line_color='black')

iplot(fig)

In [93]:

print(f'A customer with phone service has a probability of {round(data[(data["PhoneService"] == "Yes") & (data["Churn"] == "Yes")].count()[0] / data[(data["PhoneService"] == "Yes")].count()[0] *100,2)} % churn')

print(f'A customer without phone service has a probability of {round(data[(data["PhoneService"] == "No") & (data["Churn"] == "Yes")].count()[0] / data[(data["PhoneService"] == "No")].count()[0]*100,2)} % churn')

A customer with phone service has a probability of 26.71 % churn
A customer without phone service has a probability of 24.93 % churn

Very small fraction of customers don't have a phone service and out of that, about $25\%$ Customers are more likely to churn.

In [94]:

fig = px.histogram(data, x='MonthlyCharges', color='Churn', marginal='box', title='<b>Monthly Charges Distribution w.r.t. Churn<b>',
             color_discrete_sequence = ['#84D57F', '#C959DA'])
iplot(fig)

Customers with higher Monthly Charges are more likely to churn.

In [95]:

fig = px.histogram(data, x='TotalCharges', color='Churn', marginal='box', title='<b>Total Charges Distribution w.r.t. Churn<b>',
             color_discrete_sequence = ['blue', 'red'])
iplot(fig)

Customers with higher Total Charges are more likely to churn.

In [96]:

fig = px.histogram(data, x='tenure', color='Churn', marginal='box', title='<b>Tenure Distribution w.r.t. Churn<b>',
             color_discrete_sequence = ['orange', 'green'])
iplot(fig)

Customers who stayed with the company for longer time are more less likely to churn now.

¶

5. Outlier Detection

Back to Table of Contents

The presence of outliers in a classification or regression dataset can result in a poor fit and lower predictive modeling performance, therefore we should see there are ouliers in the data.

In [97]:

data=data.drop(labels=['customerID'],axis=1)

In [98]:

sns.distplot(data.TotalCharges);

In [99]:

sns.distplot(data.MonthlyCharges);

In [100]:

sns.distplot(data.tenure);

Another way of visualising outliers is using boxplots and whiskers, which provides the quantiles (box) and inter-quantile range (whiskers), with the outliers sitting outside the error bars (whiskers).

All the dots in the plot below are outliers according to the quantiles + 1.5 IQR rule

first let's specify the datatype of TotalCharges as numerical.

In [101]:

data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')

In [102]:

fig = make_subplots(rows=1, cols=3)

fig.add_trace(go.Box(y=data['MonthlyCharges'], notched=True, name='Monthly Charges', marker_color = '#6699ff', 
                     boxmean=True, boxpoints='suspectedoutliers'), 1, 2)

fig.add_trace(go.Box(y=data['TotalCharges'], notched=True, name='Total Charges', marker_color = '#ff0066', 
                     boxmean=True, boxpoints='suspectedoutliers'), 1, 1)

fig.add_trace(go.Box(y=data['tenure'], notched=True, name='Tenure', marker_color = 'lightseagreen', 
                     boxmean=True, boxpoints='suspectedoutliers'), 1, 3)

fig.update_layout(title_text='<b>Box Plots for Numerical Variables<b>')

iplot(fig)

In [103]:

def detect_outliers(d):
  for i in d:
    Q3, Q1 = np.percentile(data[i], [75 ,25])
    IQR = Q3 - Q1

    ul = Q3+1.5*IQR
    ll = Q1-1.5*IQR

    outliers = data[i][(data[i] > ul) | (data[i] < ll)]
    print(f'*** {i} outlier points***', '\n', outliers, '\n')

In [104]:

detect_outliers(['tenure', 'MonthlyCharges', 'TotalCharges'])

*** tenure outlier points*** 
 Series([], Name: tenure, dtype: int64) 

*** MonthlyCharges outlier points*** 
 Series([], Name: MonthlyCharges, dtype: float64) 

*** TotalCharges outlier points*** 
 Series([], Name: TotalCharges, dtype: float64)

There is no outlier.

¶

6. Check for Rare Categories

Back to Table of Contents

Some categories may appear a lot in the dataset, whereas some other categories appear only in a few number of observations.

Rare values in categorical variables tend to cause over-fitting, particularly in tree based methods.
Rare labels may be present in training set, but not in test set, therefore causing over-fitting to the train set.
Rare labels may appear in the test set, and not in the train set. Thus, the machine learning model will not know how to evaluate it.

In [105]:

categorical = [var for var in data.columns if data[var].dtype=='O']

In [106]:

# check the number of different labels
for var in categorical:
    print(data[var].value_counts() / np.float(len(data)))
    print()
    print()

Male      0.504756
Female    0.495244
Name: gender, dtype: float64


No     0.516967
Yes    0.483033
Name: Partner, dtype: float64


No     0.700412
Yes    0.299588
Name: Dependents, dtype: float64


Yes    0.903166
No     0.096834
Name: PhoneService, dtype: float64


No                  0.481329
Yes                 0.421837
No phone service    0.096834
Name: MultipleLines, dtype: float64


Fiber optic    0.439585
DSL            0.343746
No             0.216669
Name: InternetService, dtype: float64


No                     0.496663
Yes                    0.286668
No internet service    0.216669
Name: OnlineSecurity, dtype: float64


No                     0.438450
Yes                    0.344881
No internet service    0.216669
Name: OnlineBackup, dtype: float64


No                     0.439443
Yes                    0.343888
No internet service    0.216669
Name: DeviceProtection, dtype: float64


No                     0.493114
Yes                    0.290217
No internet service    0.216669
Name: TechSupport, dtype: float64


No                     0.398978
Yes                    0.384353
No internet service    0.216669
Name: StreamingTV, dtype: float64


No                     0.395428
Yes                    0.387903
No internet service    0.216669
Name: StreamingMovies, dtype: float64


Month-to-month    0.550192
Two year          0.240664
One year          0.209144
Name: Contract, dtype: float64


Yes    0.592219
No     0.407781
Name: PaperlessBilling, dtype: float64


Electronic check             0.335794
Mailed check                 0.228880
Bank transfer (automatic)    0.219225
Credit card (automatic)      0.216101
Name: PaymentMethod, dtype: float64


No     0.73463
Yes    0.26537
Name: Churn, dtype: float64

As shown above, there is no rare category in the categorical variables.

¶

7. Categorical Variables Encoding

Back to Table of Contents

In [107]:

data['Churn'] = data['Churn'].map({'Yes':1,'No':0})

In [108]:

data.dtypes

Out[108]:

gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges        float64
Churn                 int64
dtype: object

This step is the key to achieve a high accuracy. We use Target guided ordinal encoding. Ordering the categories according to the target means assigning a number to the category, but this numbering, this ordering, is informed by the mean of the target within the category. Briefly, we calculate the mean of the target for each label/category, then we order the labels according to these mean from smallest to biggest, and we number them accordingly.

Advantages:

Capture information within the label, therefore rendering more predictive features
Create a monotonic relationship between the variable and the target
Does not expand the feature space

Disadvantage:

Prone to cause over-fitting

This process should be done on the train data, then the ordered label will be mapped into test data. (since the data is large enough, ordered categories will be same if we consider the whole data or just the train set.)

In [109]:

categorical = [var for var in data.columns if data[var].dtype=='O']

In [110]:

def category(df):
    for var in categorical:
        ordered_labels = df.groupby([var])['Churn'].mean().sort_values().index

        ordinal_label = {k:i for i, k in enumerate(ordered_labels, 0)} 
        ordinal_label
        df[var] = df[var].map(ordinal_label)

category(data)

In [111]:

data.head(5)

Out[111]:

	gender	Partner	Dependents	tenure	PhoneService	MultipleLines	InternetService	OnlineSecurity	OnlineBackup	DeviceProtection	TechSupport	StreamingTV	StreamingMovies	Contract	PaperlessBilling	PaymentMethod	MonthlyCharges	TotalCharges	Churn
0	1	0	1	1	0	0	1	2	1	2	2	2	2	2	1	3	29.85	29.85	0
1	0	1	1	34	1	1	1	1	2	1	2	2	2	1	0	2	56.95	1889.50	0
2	0	1	1	2	1	1	1	1	1	2	2	2	2	2	1	2	53.85	108.15	1
3	0	1	1	45	0	0	1	1	2	1	1	2	2	1	0	1	42.30	1840.75	0
4	1	1	1	2	1	1	2	2	2	2	2	2	2	2	1	3	70.70	151.65	1

¶

8. Balance Data

Back to Table of Contents

In [112]:

fig = px.bar(x=data['Churn'].unique()[::-1], y=[data[data['Churn']==1].count()[0], data[data['Churn']==0].count()[0]],
       text=[np.round(data[data['Churn']==1].count()[0]/data.shape[0], 4), np.round(data[data['Churn']==0].count()[0]/data.shape[0], 4)]
       , color_discrete_sequence =['#ff9999'])

fig.update_layout(title_text='<b>Churn Count PLot<b>', xaxis = dict(tickmode = 'linear', tick0 = 0, dtick = 1),
                  width=700, height=400, bargap=0.4)

fig.update_layout({'yaxis': {'title':'Count'}, 'xaxis': {'title':'Churn'}})

iplot(fig)

As shown in the plot above, we are dealing with an imbalanced dataset. The BorderlineSMOTE method is used which involves selecting those instances of the minority class that are misclassified, such as with a k-nearest neighbor classification model. This method oversamples just those difficult instances, providing more resolution only where it may be required.

In [113]:

X = data.drop(['Churn'], axis = 1)
y = data['Churn']
oversample = BorderlineSMOTE()
X, y = oversample.fit_resample(X, y)

¶

9. Dataset Splitting

Back to Table of Contents

let's separate the data into training and testing set.

In [114]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
X_train.shape, X_test.shape

Out[114]:

((9313, 19), (1035, 19))

¶

10. Feature Scaling

Back to Table of Contents

In this section, numerical features are scaled.

StandardScaler = $\frac{x-\mu}{s}$

In [115]:

scaler = StandardScaler()
X_train[['TotalCharges','MonthlyCharges','tenure']] = scaler.fit_transform(X_train[['TotalCharges','MonthlyCharges','tenure']])
X_test[['TotalCharges','MonthlyCharges','tenure']] = scaler.transform(X_test[['TotalCharges','MonthlyCharges','tenure']]) 

¶

11. Modeling and Parameter Optimization

Back to Table of Contents

In [116]:

CV = StratifiedKFold(n_splits=10, random_state=0, shuffle=True)

Model 1 : LR

In [117]:

LR_S = LogisticRegression(random_state = 42)
params_LR = {'C': list(np.arange(1,12)), 'penalty': ['l2', 'elasticnet', 'none'], 'class_weight': ['balanced','None']}
grid_LR = RandomizedSearchCV(LR_S, param_distributions=params_LR, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_LR.fit(X_train, y_train)
print('Best parameters:', grid_LR.best_estimator_)

Best parameters: LogisticRegression(C=1, class_weight='None', random_state=42)

In [118]:

LR = LogisticRegression(random_state = 42, penalty= 'l2', class_weight= 'balanced', C=6)
cross_val_LR_Acc = cross_val_score(LR, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_LR_f1 = cross_val_score(LR, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_LR_AUC = cross_val_score(LR, X_train, y_train, cv = CV, scoring = 'roc_auc')

Model 2: Random Forest

In [119]:

RF_S = RandomForestClassifier(random_state = 42)
params_RF = {'n_estimators': list(range(50,100)), 'min_samples_leaf': list(range(1,5)), 'min_samples_split': list(range(1,5))}
grid_RF = RandomizedSearchCV(RF_S, param_distributions=params_RF, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_RF.fit(X_train, y_train)
print('Best parameters:', grid_RF.best_estimator_)

Best parameters: RandomForestClassifier(n_estimators=65, random_state=42)

In [120]:

RF = RandomForestClassifier(n_estimators=70, random_state=42)
cross_val_RF_Acc = cross_val_score(RF, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_RF_f1 = cross_val_score(RF, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_RF_AUC = cross_val_score(RF, X_train, y_train, cv = CV, scoring = 'roc_auc')

Model 3: KNN

In [121]:

KNN_S = KNeighborsClassifier()
params_KNN = {'n_neighbors': list(range(1,20))}
grid_KNN = RandomizedSearchCV(KNN_S, param_distributions=params_KNN, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_KNN.fit(X_train, y_train)
print('Best parameters:', grid_KNN.best_estimator_)

Best parameters: KNeighborsClassifier(n_neighbors=1)

In [122]:

KNN = KNeighborsClassifier(n_neighbors=1)
cross_val_KNN_Acc = cross_val_score(KNN, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_KNN_f1 = cross_val_score(KNN, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_KNN_AUC = cross_val_score(KNN, X_train, y_train, cv = CV, scoring = 'roc_auc')

Model 4: Decision Tree

In [123]:

DT_S = DecisionTreeClassifier(random_state=42)
params_DT = {'min_samples_leaf': list(range(1,6)), 'min_samples_split': list(range(1,6))}
grid_DT = RandomizedSearchCV(DT_S, param_distributions=params_DT, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_DT.fit(X_train, y_train)
print('Best parameters:', grid_DT.best_estimator_)

Best parameters: DecisionTreeClassifier(random_state=42)

In [124]:

DT = DecisionTreeClassifier(random_state=42)
cross_val_DT_Acc = cross_val_score(DT, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_DT_f1 = cross_val_score(DT, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_DT_AUC = cross_val_score(DT, X_train, y_train, cv = CV, scoring = 'roc_auc')

Model 5: Ada Boost

In [125]:

AB_S = AdaBoostClassifier(random_state=42)
params_AB = {'n_estimators': list(np.arange(50,100,10)), 'learning_rate':[0.01, 0.1, 1]}
grid_AB = RandomizedSearchCV(AB_S, param_distributions=params_AB, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_AB.fit(X_train, y_train)
print('Best parameters:', grid_AB.best_estimator_)

Best parameters: AdaBoostClassifier(learning_rate=1, n_estimators=90, random_state=42)

In [126]:

AB = AdaBoostClassifier(learning_rate=1, n_estimators=90, random_state=42)
cross_val_AB_Acc = cross_val_score(AB, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_AB_f1 = cross_val_score(AB, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_AB_AUC = cross_val_score(AB, X_train, y_train, cv = CV, scoring = 'roc_auc')

Model 6: XG Boost

In [127]:

XG_S = XGBClassifier(random_state=42)
params_XG = {'n_estimators': list(np.arange(50,150,10)), 'learning_rate':[0.01, 0.1, 1]}
grid_XG = RandomizedSearchCV(XG_S, param_distributions=params_XG, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_XG.fit(X_train, y_train)
print('Best parameters:', grid_XG.best_estimator_)

Best parameters: XGBClassifier(learning_rate=1, n_estimators=130, random_state=42)

In [128]:

XG = XGBClassifier(learning_rate=1, n_estimators=120, random_state=42)
cross_val_XG_Acc = cross_val_score(XG, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_XG_f1 = cross_val_score(XG, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_XG_AUC = cross_val_score(XG, X_train, y_train, cv = CV, scoring = 'roc_auc')

Model 7: Extra Tree Classifier

In [129]:

ET_S = ExtraTreesClassifier(random_state=42)
params_ET = {'n_estimators': list(np.arange(50,150,10))}
grid_ET = RandomizedSearchCV(XG_S, param_distributions=params_ET, cv=5, n_jobs=-1, n_iter=20, random_state=42, return_train_score=True)
grid_ET.fit(X_train, y_train)
print('Best parameters:', grid_ET.best_estimator_)

Best parameters: XGBClassifier(n_estimators=140, random_state=42)

In [130]:

ET = ExtraTreesClassifier(n_estimators=140, random_state=42)
cross_val_ET_Acc = cross_val_score(ET, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_ET_f1 = cross_val_score(ET, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_ET_AUC = cross_val_score(ET, X_train, y_train, cv = CV, scoring = 'roc_auc')

Super Learner

In [131]:

SL = SuperLearner(folds=5, random_state=42)

In [132]:

SL.add([RF, XG, ET])

Out[132]:

SuperLearner(array_check=None, backend=None, folds=5,
       layers=[Layer(backend='threading', dtype=<class 'numpy.float32'>, n_jobs=-1,
   name='layer-1', propagate_features=None, raise_on_exception=True,
   random_state=7270, shuffle=False,
   stack=[Group(backend='threading', dtype=<class 'numpy.float32'>,
   indexer=FoldIndex(X=None, folds=5, raise_on_ex...rer=None)],
   n_jobs=-1, name='group-0', raise_on_exception=True, transformers=[])],
   verbose=0)],
       model_selection=False, n_jobs=None, raise_on_exception=True,
       random_state=42, sample_size=20, scorer=None, shuffle=False,
       verbose=False)

In [133]:

SL.add_meta(MLPClassifier())

Out[133]:

SuperLearner(array_check=None, backend=None, folds=5,
       layers=[Layer(backend='threading', dtype=<class 'numpy.float32'>, n_jobs=-1,
   name='layer-1', propagate_features=None, raise_on_exception=True,
   random_state=7270, shuffle=False,
   stack=[Group(backend='threading', dtype=<class 'numpy.float32'>,
   indexer=FoldIndex(X=None, folds=5, raise_on_ex...rer=None)],
   n_jobs=-1, name='group-1', raise_on_exception=True, transformers=[])],
   verbose=0)],
       model_selection=False, n_jobs=None, raise_on_exception=True,
       random_state=42, sample_size=20, scorer=None, shuffle=False,
       verbose=False)

In [134]:

cross_val_SL_Acc = cross_val_score(SL, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_SL_f1 = cross_val_score(SL, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_SL_AUC = cross_val_score(SL, X_train, y_train, cv = CV, scoring = 'roc_auc')

Stacking

In [135]:

estimators = [('DT', DT),
              ('RF', RF),
              ('ET', ET),
              ('LR', LR),
              ('KNN', KNN),
              ('XG', XG),
              ('AB', AB)]
              
Stack = StackingClassifier(estimators = estimators, final_estimator = MLPClassifier())

In [136]:

cross_val_ST_Acc = cross_val_score(Stack, X_train, y_train, cv = CV, scoring = 'accuracy') 
cross_val_ST_f1 = cross_val_score(Stack, X_train, y_train, cv = CV, scoring = 'f1')
cross_val_ST_AUC = cross_val_score(Stack, X_train, y_train, cv = CV, scoring = 'roc_auc')

¶

12. Feature Importance

Back to Table of Contents

What features contribute more to predict the target (Churn)? let's find out how useful they are at predicting the target variable.

Random Forest algorithm offer importance scores based on the reduction in the criterion used to select split points, like Gini or entropy.

In [153]:

RF_I = RandomForestClassifier(n_estimators=70, random_state=42)
RF_I.fit(X, y)

Out[153]:

RandomForestClassifier(n_estimators=70, random_state=42)

In [154]:

d = {'Features': X_train.columns, 'Feature Importance': RF_I.feature_importances_}
df = pd.DataFrame(d)
df_sorted = df.sort_values(by='Feature Importance', ascending = True)
df_sorted
df_sorted.style.background_gradient(cmap='Blues')

Out[154]:

	Features	Feature Importance
5	PhoneService	0.008060
1	SeniorCitizen	0.016236
3	Dependents	0.018789
2	Partner	0.022521
15	PaperlessBilling	0.023343
10	DeviceProtection	0.024419
6	MultipleLines	0.024435
9	OnlineBackup	0.025233
12	StreamingTV	0.027024
0	gender	0.027551
8	OnlineSecurity	0.028928
11	TechSupport	0.029647
13	StreamingMovies	0.033099
7	InternetService	0.034272
16	PaymentMethod	0.045543
14	Contract	0.098061
17	MonthlyCharges	0.159143
4	tenure	0.164164
18	TotalCharges	0.189532

In [155]:

fig = px.bar(x=df_sorted['Feature Importance'], y=df_sorted['Features'], color_continuous_scale=px.colors.sequential.Blues,
             title='<b>Feature Importance Based on Random Forest<b>', text_auto='.4f', color=df_sorted['Feature Importance'])

fig.update_traces(marker=dict(line=dict(color='black', width=2)))
fig.update_layout({'yaxis': {'title':'Features'}, 'xaxis': {'title':'Feature Importance'}})

iplot(fig)

¶

13. Results

Back to Table of Contents

In [156]:

compare_models = [('Logistic Regression', cross_val_LR_Acc.mean(),cross_val_LR_f1.mean(),cross_val_LR_AUC.mean(), ''),
                  ('Random Forest', cross_val_RF_Acc.mean(),cross_val_RF_f1.mean(),cross_val_RF_AUC.mean(), ''),
                  ('KNN', cross_val_KNN_Acc.mean(),cross_val_KNN_f1.mean(),cross_val_KNN_AUC.mean(), ''),
                  ('Decision Tree', cross_val_DT_Acc.mean(), cross_val_DT_f1.mean(),cross_val_DT_AUC.mean(), ''),
                  ('Ada Boost', cross_val_AB_Acc.mean(), cross_val_AB_f1.mean(),cross_val_AB_AUC.mean(), ''),
                  ('XG Boost', cross_val_XG_Acc.mean(), cross_val_XG_f1.mean(),cross_val_XG_AUC.mean(), ''),
                  ('Extra Tree', cross_val_ET_Acc.mean(), cross_val_ET_f1.mean(),cross_val_ET_AUC.mean(), ''),
                  ('Super Learner', cross_val_SL_Acc.mean(), cross_val_SL_f1.mean(),cross_val_SL_AUC.mean(), ''),
                  ('Stacking', cross_val_ST_Acc.mean(), cross_val_ST_f1.mean(),cross_val_ST_AUC.mean(), 'best model')]

In [157]:

compare = pd.DataFrame(data = compare_models, columns=['Model','Accuracy Mean', 'F1 Score Mean', 'AUC Score Mean', 'Description'])
compare.style.background_gradient(cmap='YlGn')

Out[157]:

	Model	Accuracy Mean	F1 Score Mean	AUC Score Mean	Description
0	Logistic Regression	0.746267	0.759828	0.825137
1	Random Forest	0.834962	0.841735	0.915320
2	KNN	0.772686	0.785045	0.772846
3	Decision Tree	0.782881	0.786659	0.783714
4	Ada Boost	0.769136	0.781800	0.846343
5	XG Boost	0.797485	0.803550	0.878836
6	Extra Tree	0.827876	0.832871	0.908827
7	Super Learner	0.833995	0.840168	nan
8	Stacking	0.841942	0.844890	0.921315	best model

In [158]:

d1 = {'Logistic Regression':cross_val_LR_Acc, 'Random Forest':cross_val_RF_Acc, 'KNN':cross_val_KNN_Acc, 'Decision Tree':cross_val_DT_Acc,
     'Ada Boost':cross_val_AB_Acc, 'XG Boost':cross_val_XG_Acc, 'Extra Tree':cross_val_ET_Acc, 'Super Learner':cross_val_SL_Acc,
     'Stacking':cross_val_ST_Acc}
d_accuracy = pd.DataFrame(data = d1)

In [159]:

d2 = {'Logistic Regression':cross_val_LR_f1, 'Random Forest':cross_val_RF_f1, 'KNN':cross_val_KNN_f1, 'Decision Tree':cross_val_DT_f1,
     'Ada Boost':cross_val_AB_f1, 'XG Boost':cross_val_XG_f1, 'Extra Tree':cross_val_ET_f1, 'Super Learner':cross_val_SL_f1,
     'Stacking':cross_val_ST_f1}
d_f1 = pd.DataFrame(data = d2)

In [160]:

d3 = {'Logistic Regression':cross_val_LR_AUC, 'Random Forest':cross_val_RF_AUC, 'KNN':cross_val_KNN_AUC, 'Decision Tree':cross_val_DT_AUC,
     'Ada Boost':cross_val_AB_AUC, 'XG Boost':cross_val_XG_AUC, 'Extra Tree':cross_val_ET_AUC, 'Super Learner':cross_val_SL_AUC,
     'Stacking':cross_val_ST_AUC}
d_auc = pd.DataFrame(data = d3)

In [161]:

fig = go.Figure()
fig.add_trace(go.Box(name='Logistic Regression', y=d_accuracy.iloc[:,0]))
fig.add_trace(go.Box(name='Random Forest', y=d_accuracy.iloc[:,1]))
fig.add_trace(go.Box(name='KNN', y=d_accuracy.iloc[:,2]))
fig.add_trace(go.Box(name='Decision Tree', y=d_accuracy.iloc[:,3]))
fig.add_trace(go.Box(name='Ada Boost', y=d_accuracy.iloc[:,4]))
fig.add_trace(go.Box(name='XG Boost', y=d_accuracy.iloc[:,5]))
fig.add_trace(go.Box(name='Extra Tree', y=d_accuracy.iloc[:,6]))
fig.add_trace(go.Box(name='Super Learner', y=d_accuracy.iloc[:,7]))
fig.add_trace(go.Box(name='Stacking', y=d_accuracy.iloc[:,8]))

fig.update_traces(boxpoints='all', boxmean=True)

fig.update_layout(title_text='<b>Box Plots for Models Accuracy (train)<b>')

iplot(fig)

In [162]:

fig = go.Figure()
fig.add_trace(go.Box(name='Logistic Regression', y=d_f1.iloc[:,0]))
fig.add_trace(go.Box(name='Random Forest', y=d_f1.iloc[:,1]))
fig.add_trace(go.Box(name='KNN', y=d_f1.iloc[:,2]))
fig.add_trace(go.Box(name='Decision Tree', y=d_f1.iloc[:,3]))
fig.add_trace(go.Box(name='Ada Boost', y=d_f1.iloc[:,4]))
fig.add_trace(go.Box(name='XG Boost', y=d_f1.iloc[:,5]))
fig.add_trace(go.Box(name='Extra Tree', y=d_f1.iloc[:,6]))
fig.add_trace(go.Box(name='Super Learner', y=d_f1.iloc[:,7]))
fig.add_trace(go.Box(name='Stacking', y=d_f1.iloc[:,8]))

fig.update_traces(boxpoints='all', boxmean=True)

fig.update_layout(title_text='<b>Box Plots for Models F1 Score (train)<b>')

iplot(fig)

In [163]:

fig = go.Figure()
fig.add_trace(go.Box(name='Logistic Regression', y=d_auc.iloc[:,0]))
fig.add_trace(go.Box(name='Random Forest', y=d_auc.iloc[:,1]))
fig.add_trace(go.Box(name='KNN', y=d_auc.iloc[:,2]))
fig.add_trace(go.Box(name='Decision Tree', y=d_auc.iloc[:,3]))
fig.add_trace(go.Box(name='Ada Boost', y=d_auc.iloc[:,4]))
fig.add_trace(go.Box(name='XG Boost', y=d_auc.iloc[:,5]))
fig.add_trace(go.Box(name='Extra Tree', y=d_auc.iloc[:,6]))
fig.add_trace(go.Box(name='Stacking', y=d_auc.iloc[:,8]))

fig.update_traces(boxpoints='all', boxmean=True)

fig.update_layout(title_text='<b>Box Plots for Models AUC (train)<b>')

iplot(fig)

Stacking model is the most stable and accurate model. As a result, Stacking is selected for the purpose of predicting Churn.

In [164]:

Stack.fit(X_train, y_train)
y_pred = Stack.predict(X_test)

In [165]:

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.86      0.83      0.85       505
           1       0.85      0.87      0.86       530

    accuracy                           0.85      1035
   macro avg       0.85      0.85      0.85      1035
weighted avg       0.85      0.85      0.85      1035

In [166]:

y_prob = Stack.predict_proba(X_test)
roc_auc_score(y_test, y_prob[:,1],average='macro')

Out[166]:

0.9302073603586775

In [167]:

fpr, tpr, thresholds = roc_curve(y_test, y_prob[:,1])

fig = px.area(
    x=fpr, y=tpr,
    title=f'<b>ROC Curve (AUC={auc(fpr, tpr):.4f})<b>',
    labels=dict(x='False Positive Rate', y='True Positive Rate'),
    width=700, height=500, color_discrete_sequence=['#DA598A'])

fig.add_shape(
    type='line', line=dict(dash='dash'),
    x0=0, x1=1, y0=0, y1=1
)

fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.update_xaxes(constrain='domain')
iplot(fig)

In [168]:

cm = confusion_matrix(y_test, y_pred)
cm = cm.astype(int)
       
fig = ff.create_annotated_heatmap(z=cm[::-1], x=['No','Yes'], y=['Yes', 'No'], colorscale='Blues', annotation_text=cm[::-1]) 

fig.update_layout(title_text='<b>Confusion Matrix of Stacking Model<b>',
                  xaxis_title = 'Predicted value', yaxis_title = 'Real value', width=800, height=500)

iplot(fig)

We achieved about $86\%$ accuracy on the test.

Customer churn is definitely bad to a firm ’s profitability. Various strategies can be implemented to eliminate customer churn. The best way to avoid customer churn is for a company to truly know its customers. This includes identifying customers who are at risk of churning and working to improve their satisfaction. Improving customer service is, of course, at the top of the priority for tackling this issue. Building customer loyalty through relevant experiences and specialized service is another strategy to reduce customer churn. Some firms survey customers who have already churned to understand their reasons for leaving in order to adopt a proactive approach to avoiding future customer churn.